5.3 Calculating Least Squares Linear Regression

Summary of Key Points

I. Core Concepts

1. Least Squares Method

The least squares method is a mathematical optimization technique that finds the best-fitting function for the data by minimizing the sum of the squared errors.

Objective Function: minimize the sum of squared residuals, where \(\hat{y}_i\) is the value the fitted line predicts for \(x_i\):

\[S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]
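As a rough sketch of what this objective measures, the helper below evaluates \(S\) for a candidate line; the function name and data are illustrative, not from the text:

```python
def sum_squared_residuals(x, y, a, b):
    """S = sum of (y_i - y_hat_i)^2 for the candidate line y_hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data; least squares picks the (a, b) that make S as small as possible.
x = [1, 2, 3, 4]
y = [2.0, 4.1, 5.9, 8.0]
print(sum_squared_residuals(x, y, 0.0, 2.0))  # S for the line y_hat = 2x
print(sum_squared_residuals(x, y, 1.0, 1.0))  # a worse line gives a larger S
```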

2. Regression Line

The regression line is a mathematical description of the linear relationship between the independent variable and the dependent variable, with the form:

\[\hat{y} = a + bx\]

where \(a\) is the intercept and \(b\) is the slope.

3. Key Statistics
  • Sum of squared deviations \(S_{xx}\): the sum of squared deviations of the independent variable from its mean; it measures the spread of the independent variable.
  • Sum of products of deviations \(S_{xy}\): the sum of products of the deviations of the independent and dependent variables from their means; it measures how the two variables vary together.
  • Regression coefficients: the slope \(b\) and the intercept \(a\), which describe the strength and direction of the relationship between the independent and dependent variables.

II. Important Formulas

Core Formulas for Regression Analysis
1. Formula for \(S_{xx}\)
\[S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}\]
The sum of squared deviations of the independent variable, measuring its spread.
2. Formula for \(S_{xy}\)
\[S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}\]
The sum of products of deviations of the independent and dependent variables, measuring the strength of their linear relationship.
3. Formula for the Slope \(b\)
\[b = \frac{S_{xy}}{S_{xx}}\]
The slope is the average change in the dependent variable for each one-unit change in the independent variable.
4. Formula for the Intercept \(a\)
\[a = \bar{y} - b\bar{x}\]
The intercept is the predicted value of the dependent variable when the independent variable is 0.
5. Regression Equation
\[\hat{y} = a + bx\]
Used to predict the value of the dependent variable from a value of the independent variable.
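The formulas above can be collected into one small function. This is a minimal sketch (the function name is mine), using the deviation-from-the-mean forms of \(S_{xx}\) and \(S_{xy}\):

```python
def fit_line(x, y):
    """Return (a, b) for y_hat = a + b*x via the least squares formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

a, b = fit_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])  # exactly linear data
print(a, b)  # recovers y_hat = 1 + 2x
```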

III. Calculation Steps

Steps for Calculating Least Squares Linear Regression
  1. Determine the sample size \(n\).
  2. Calculate the necessary sums: \(\sum x_i\), \(\sum y_i\), \(\sum x_i^2\), \(\sum y_i^2\), and \(\sum x_i y_i\).
  3. Calculate the means: \(\bar{x} = \frac{\sum x_i}{n}\) and \(\bar{y} = \frac{\sum y_i}{n}\).
  4. Calculate \(S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}\).
  5. Calculate \(S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}\).
  6. Calculate the slope \(b = \frac{S_{xy}}{S_{xx}}\).
  7. Calculate the intercept \(a = \bar{y} - b\bar{x}\).
  8. Write the regression equation \(\hat{y} = a + bx\).
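The eight steps translate almost line for line into code. A sketch using the computational (shortcut) forms of \(S_{xx}\) and \(S_{xy}\), on made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data, not the worked example below
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)                                     # Step 1: sample size
sum_x, sum_y = sum(x), sum(y)                  # Step 2: necessary sums
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
x_bar, y_bar = sum_x / n, sum_y / n            # Step 3: means
s_xx = sum_x2 - sum_x ** 2 / n                 # Step 4: S_xx
s_xy = sum_xy - sum_x * sum_y / n              # Step 5: S_xy
b = s_xy / s_xx                                # Step 6: slope
a = y_bar - b * x_bar                          # Step 7: intercept
print(f"y_hat = {a:.3f} + {b:.3f}x")           # Step 8: the regression equation
```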

IV. Calculation Example

Following the steps above with \(n = 5\) observations:
  1. Sample size: \(n = 5\)
  2. Sums: \(\sum x_i = 25\), \(\sum y_i = 116\), \(\sum x_i^2 = 151\), \(\sum y_i^2 = 2928\), \(\sum x_i y_i = 658\)
  3. Means: \(\bar{x} = \frac{25}{5} = 5\), \(\bar{y} = \frac{116}{5} = 23.2\)
  4. \(S_{xx} = 151 - \frac{25^2}{5} = 26\)
  5. \(S_{xy} = 658 - \frac{25 \times 116}{5} = 78\)
  6. \(b = \frac{78}{26} = 3\)
  7. \(a = 23.2 - 3 \times 5 = 8.2\)
  8. Regression equation: \(\hat{y} = 8.2 + 3x\)
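The example can be checked by running the same arithmetic from the summary sums alone (the raw data points themselves are not given in the text):

```python
n = 5
sum_x, sum_y = 25, 116
sum_x2, sum_xy = 151, 658                 # sums of x_i^2 and of x_i * y_i

x_bar, y_bar = sum_x / n, sum_y / n       # 5.0 and 23.2
s_xx = sum_x2 - sum_x ** 2 / n            # 151 - 125 = 26
s_xy = sum_xy - sum_x * sum_y / n         # 658 - 580 = 78
b = s_xy / s_xx                           # 78 / 26 = 3.0
a = y_bar - b * x_bar                     # 23.2 - 15 = 8.2
print(f"y_hat = {a} + {b}x")              # y_hat = 8.2 + 3.0x
```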

V. Important Tips

Notes on Calculation and Application
  • Linearity assumption: make sure a linear relationship actually holds between the independent and dependent variables; otherwise linear regression may not be applicable.
  • Outliers: outliers can severely distort the estimated regression coefficients; check for and handle them before fitting.
  • Mean-point property: the regression line always passes through the point \((\bar{x}, \bar{y})\), which provides a quick check on the calculations.
  • Avoid extrapolation: the regression equation is generally valid only within the range of the original data; avoid predicting for values outside that range.
  • Precision: in manual calculations, carry enough decimal places to keep the final results accurate.
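The mean-point property in the third tip is easy to confirm numerically; a sketch on made-up data, with the fit done by the deviation-form formulas from this section:

```python
# Made-up data; the fit follows b = S_xy / S_xx and a = y_bar - b * x_bar.
x = [1.0, 2.0, 4.0, 5.0]
y = [3.0, 5.0, 4.0, 8.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
# The fitted line, evaluated at x_bar, reproduces y_bar (up to rounding):
assert abs((a + b * x_bar) - y_bar) < 1e-12
print("mean-point check passed")
```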